{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python module & CLI\n",
    "\n",
    "> Read llms.txt files and create XML context documents for LLMs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#| hide\n",
    "from fastcore.utils import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given an `llms.txt` file, this library provides a CLI and a Python API for parsing it and generating an XML context file from it. The input file should follow this format:\n",
    "\n",
    "```\n",
    "# FastHTML\n",
    "\n",
    "> FastHTML is a python library which...\n",
    "\n",
    "When writing FastHTML apps remember to:\n",
    "\n",
    "- Thing to remember\n",
    "\n",
    "## Docs\n",
    "\n",
    "- [Surreal](https://host/README.md): Tiny jQuery alternative with Locality of Behavior\n",
    "- [FastHTML quick start](https://host/quickstart.html.md): An overview of FastHTML features\n",
    "\n",
    "## Examples\n",
    "\n",
    "- [Todo app](https://host/adv_app.py)\n",
    "\n",
    "## Optional\n",
    "\n",
    "- [Starlette docs](https://host/starlette-sml.md): A subset of the Starlette docs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```sh\n",
    "pip install llms-txt\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to use"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CLI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After installation, `llms_txt2ctx` is available in your terminal.\n",
    "\n",
    "To get help for the CLI:\n",
    "\n",
    "```sh\n",
    "llms_txt2ctx -h\n",
    "```\n",
    "\n",
    "To convert an `llms.txt` file to XML context and save to `llms.md`:\n",
    "\n",
    "```sh\n",
    "llms_txt2ctx llms.txt > llms.md\n",
    "```\n",
    "\n",
    "Pass `--optional True` to also include the `Optional` section of the input file in the output."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python module"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llms_txt import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samp = Path('llms-sample.txt').read_text()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `parse_llms_file` to create a data structure with the sections of an llms.txt file (you can also add `optional=True` if needed):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['title', 'summary', 'info', 'sections']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parsed = parse_llms_file(samp)\n",
    "list(parsed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('FastHTML',\n",
       " 'FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore\\'s `FT` \"FastTags\" into a library for creating server-rendered hypermedia applications.')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parsed.title,parsed.summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Docs', 'Examples', 'Optional']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(parsed.sections)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```json\n",
       "{ 'desc': 'A subset of the Starlette documentation useful for FastHTML '\n",
       "          'development.',\n",
       "  'title': 'Starlette full documentation',\n",
       "  'url': 'https://gist.githubusercontent.com/jph00/809e4a4808d4510be0e3dc9565e9cbd3/raw/9b717589ca44cedc8aaf00b2b8cacef922964c0f/starlette-sml.md'}\n",
       "```"
      ],
      "text/plain": [
       "{'title': 'Starlette full documentation',\n",
       " 'url': 'https://gist.githubusercontent.com/jph00/809e4a4808d4510be0e3dc9565e9cbd3/raw/9b717589ca44cedc8aaf00b2b8cacef922964c0f/starlette-sml.md',\n",
       " 'desc': 'A subset of the Starlette documentation useful for FastHTML development.'}"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parsed.sections.Optional[0]"
   ]
  },
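  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the parsed result is a nested structure of sections and link entries, ordinary Python iteration is enough to work with it. The sketch below is illustrative only: it uses a plain `dict` with the same keys as the output above as a stand-in for `parsed`, and `iter_links` is a hypothetical helper, not part of `llms_txt`.\n",
    "\n",
    "```python\n",
    "# Stand-in for the parsed structure shown above (a plain dict, not the library's return type)\n",
    "parsed_like = {\n",
    "    'title': 'FastHTML',\n",
    "    'summary': 'FastHTML is a python library which...',\n",
    "    'info': 'When writing FastHTML apps remember to:',\n",
    "    'sections': {\n",
    "        'Docs': [{'title': 'Surreal', 'url': 'https://host/README.md',\n",
    "                  'desc': 'Tiny jQuery alternative with Locality of Behavior'}],\n",
    "        'Examples': [{'title': 'Todo app', 'url': 'https://host/adv_app.py', 'desc': None}],\n",
    "        'Optional': [{'title': 'Starlette docs', 'url': 'https://host/starlette-sml.md',\n",
    "                      'desc': 'A subset of the Starlette docs'}],\n",
    "    },\n",
    "}\n",
    "\n",
    "def iter_links(d, skip=('Optional',)):\n",
    "    # Hypothetical helper: yield (section, title, url) for each link,\n",
    "    # leaving out any section whose name appears in `skip`\n",
    "    for name, links in d['sections'].items():\n",
    "        if name in skip: continue\n",
    "        for link in links:\n",
    "            yield name, link['title'], link['url']\n",
    "\n",
    "for section, title, url in iter_links(parsed_like):\n",
    "    print(f'{section}: {title} -> {url}')\n",
    "```\n",
    "\n",
    "Passing `skip=()` to this helper yields the links from every section, including `Optional`."
   ]
  },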
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `create_ctx` to create an LLM context file with XML sections, suitable for systems such as Claude (this is what the CLI calls behind the scenes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ctx = create_ctx(samp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<project title=\"FastHTML\" summary='FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore&#39;s `FT` \"FastTags\" into a library for creating server-rendered hypermedia applications.'>\n",
      "Remember:\n",
      "\n",
      "- Use `serve()` for running uvicorn (`if __name__ == \"__main__\"` is not\n"
     ]
    }
   ],
   "source": [
    "print(ctx[:300])"
   ]
  },
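  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about the shape of this output, here is a rough sketch using only the standard library. It is not the library's implementation: the `<project>` root with `title` and `summary` attributes matches the output above, but `sketch_ctx`, the per-section element names, and the `<doc>` element are assumptions made for illustration, and no linked documents are downloaded.\n",
    "\n",
    "```python\n",
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "def sketch_ctx(title, summary, info, sections):\n",
    "    # Hypothetical helper: wrap already-parsed fields in a <project> element\n",
    "    project = ET.Element('project', title=title, summary=summary)\n",
    "    project.text = info\n",
    "    for name, links in sections.items():\n",
    "        # Assumption: one child element per section, named after the section\n",
    "        sect = ET.SubElement(project, name.lower())\n",
    "        for link in links:\n",
    "            doc = ET.SubElement(sect, 'doc', title=link['title'])\n",
    "            # Placeholder: the URL stands in for the document text\n",
    "            doc.text = link['url']\n",
    "    return ET.tostring(project, encoding='unicode')\n",
    "\n",
    "out = sketch_ctx('FastHTML', 'A python library', 'Remember:',\n",
    "                 {'Docs': [{'title': 'Surreal', 'url': 'https://host/README.md'}]})\n",
    "print(out)\n",
    "```\n",
    "\n",
    "Building the tree with `ElementTree` rather than string formatting means attribute values containing quotes or `&` are escaped automatically."
   ]
  },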
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementation and tests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show how simple it is to parse `llms.txt` files, here's a complete parser in under 20 lines of code, using only the Python standard library:\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "import re,itertools\n",
    "\n",
    "def chunked(it, chunk_sz):\n",
    "    it = iter(it)\n",
    "    return iter(lambda: list(itertools.islice(it, chunk_sz)), [])\n",
    "\n",
    "def parse_llms_txt(txt):\n",
    "    \"Parse llms.txt file contents in `txt` to a `dict`\"\n",
    "    def _p(links):\n",
    "        link_pat = r'-\\s*\\[(?P<title>[^\\]]+)\\]\\((?P<url>[^\\)]+)\\)(?::\\s*(?P<desc>.*))?'\n",
    "        return [re.search(link_pat, l).groupdict()\n",
    "                for l in re.split(r'\\n+', links.strip()) if l.strip()]\n",
    "\n",
    "    start,*rest = re.split(r'^##\\s*(.*?$)', txt, flags=re.MULTILINE)\n",
    "    sects = {k: _p(v) for k,v in dict(chunked(rest, 2)).items()}\n",
    "    pat = r'^#\\s*(?P<title>.+?$)\\n+(?:^>\\s*(?P<summary>.+?$)$)?\\n+(?P<info>.*)'\n",
    "    d = re.search(pat, start.strip(), (re.MULTILINE|re.DOTALL)).groupdict()\n",
    "    d['sections'] = sects\n",
    "    return d\n",
    "```"
   ]
  },
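  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parser above rests on two small building blocks: a link pattern with named groups, and `re.split` with a capture group so that section names are kept alongside their bodies. The following sketch exercises each in isolation; the sample strings and URLs are made up for illustration.\n",
    "\n",
    "```python\n",
    "import re, itertools\n",
    "\n",
    "def chunked(it, chunk_sz):\n",
    "    # Same helper as above: yield lists of up to `chunk_sz` items until exhausted\n",
    "    it = iter(it)\n",
    "    return iter(lambda: list(itertools.islice(it, chunk_sz)), [])\n",
    "\n",
    "# 1. Link pattern: `desc` is None when a link has no ': description' suffix\n",
    "link_pat = r'-\\s*\\[(?P<title>[^\\]]+)\\]\\((?P<url>[^\\)]+)\\)(?::\\s*(?P<desc>.*))?'\n",
    "with_desc = re.search(link_pat, '- [Surreal](https://host/README.md): Tiny jQuery alternative').groupdict()\n",
    "no_desc = re.search(link_pat, '- [Todo app](https://host/adv_app.py)').groupdict()\n",
    "print(with_desc)\n",
    "print(no_desc)\n",
    "\n",
    "# 2. Splitting on H2 headings: the capture group keeps each section name in the result\n",
    "txt = '\\n'.join(['# Title', '', '> Summary', '', '## Docs', '', '- [A](https://host/a.md)',\n",
    "                 '', '## Examples', '', '- [B](https://host/b.py)'])\n",
    "start, *rest = re.split(r'^##\\s*(.*?$)', txt, flags=re.MULTILINE)\n",
    "pairs = list(chunked(rest, 2))  # [[name, body], [name, body], ...]\n",
    "print([name for name, body in pairs])\n",
    "```\n",
    "\n",
    "Here `start` holds everything before the first `##` heading (title, summary, and info), which is why the parser matches its second pattern against `start` only."
   ]
  },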
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have provided a test suite in `tests/test-parse.py` and confirmed that this implementation passes all tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
